Copyright Issues
Copyright Working Group Report
Chair:
Eric Kansa
Participants:
Jeanne Altmann, Eric Delson, Tom Moritz
The Copyright Working Group discussion focused on the legal and social norms that govern the ownership of anthropological research data. Issues around privacy and cultural sensitivities related to certain classes of anthropological information may sometimes require access and use restrictions for ethical management and curation. However, these ethical considerations, while essential in shaping future digital curation policies, were not the main focus of discussion for the Copyright Working Group. Instead of focusing on the privacy needs of anthropological stakeholders (especially human subjects in research), the Copyright Working Group was mainly focused on the ownership claims and interests of professional researchers.
While our group saw very lively discussion and debate, we came to little consensus about the best way to shape data ownership policies. Though all members of the group agreed that preservation and dissemination of primary data are important priorities and could greatly improve the research process, we debated the topic of allowing or requiring unrestricted, anonymous access to research data. Important discussion themes included:
Library Perspective:
Libraries have a strong ethical tradition of sharing knowledge and information as broadly as possible. One participant cited Thomas Jefferson and his belief that the field of knowledge is the common property of all mankind. Withholding data works against core scientific principles, because withheld data makes it difficult or impossible to falsify claims.
Concerns over “Free-Riders”:
A major concern over data sharing and ownership relates to the potential for benefiting one class of researchers over another. For instance, researchers who engage in fieldwork and data collection often spend a significant amount of time writing grants and, once they obtain adequate funding, spend more time gathering data (time that could be spent publishing). On the other hand, more theoretically inclined researchers who spend less time and effort funding and executing fieldwork are often able to produce publications faster. Because they have more to show for their efforts, theoreticians may be unfairly advantaged for tenure and promotion, particularly if high quality research data is easily available. Thus, if data sharing policies do not take into account the potential for data “free-riding”, field researchers may suffer.
Options Discussed:
Discussion over credit and incentives often touched back on concerns over “free-riding” and adequate recognition for the contributions of researchers who invest so much effort and face so many risks in data gathering. Elinor Ostrom’s research into common-pool resources relates to the free-rider concern. Her findings indicate that some people who could contribute to a common data resource will not participate if there are any free-riders.
All working group participants agreed that data sharing should be recognized by the profession. However, the working group differed in their opinions on the best mechanisms and approaches to promote recognition and combat free-riding. These opinions included:
Publication Norms: Some working group participants regarded reliance on professional social norms as the best way to provide the proper credit and incentives for researchers to share data. Researchers publish their insights, analyses, and to some extent their data through journals and other venues because they feel confident that they will be credited for their efforts. Citation norms already exist, and researchers routinely cite each other’s published presentations of data. If data sharing takes the form of publishing (“data sharing as publication”) then these established norms and forms of professional recognition could help provide needed incentives for data sharing. Data sharing in the context of publication also enhances the value of shared data, because datasets need extensive documentation and explanation for reuse.
Restricting Contracts / Agreements: Other working group participants believed that stronger measures based on access controls are required to combat free-riding. Access to research data should be provided on a conditional basis. Researchers that produce data may want to know who is accessing their data and why. In addition, some researchers may want to protect their data from misuse by anti-scientific (including commercial or religious) agendas. Because of these concerns, access to data should require some combination of login and identification and / or a “click-through” agreement to proper uses of a given dataset.
The working group came to no consensus about the relative merits and risks of these various options. Some working group participants favored much more open forms of data dissemination and others saw access and use restrictions as an important safeguard. The various conflicting points of view are as follows:
Issue / Options | Arguments for Controls | Arguments against Controls
Identification of users | (1) Researchers who publish datasets want to know how their data are used and by whom. Identifying individuals who request data represents a minor (and non-onerous) form of compensation for sharing data. It can also help guard against plagiarism. (2) Given the privacy and ethical sensitivities of many classes of anthropological data, identity management systems will be required. Thus, this poses no extra complication. | (1) There are privacy and academic freedom concerns to consider. Libraries traditionally regard information retrieval requests as sensitive private data, and destroy all record of such transactions once a book is returned. Anonymous requests for data offer better privacy protections and academic freedom, especially considering that research designs and research questions may sometimes be revealed by requests for data. (2) Identity management makes data dissemination more costly to build and manage.
Special click-through agreements | (1) Protecting datasets from misuse is an important requirement and necessitates click-through agreements. (2) Click-through agreements and requirements for individual login are minor and not onerous. In practice, multidisciplinary research can cope with limited restrictions on the access and use of data. (3) If a data repository is large and well known, interested researchers will be drawn to it and search-engine discovery issues will be less of a problem. Adequate metadata description can make datasets visible for casual discovery. | (1) Scholarly publications are already available in the “open literature” and can be used by all, even for potentially misguided commercial or religious applications. Trying to regulate use of scientific literature raises a host of difficulties and freedom of expression issues and runs counter to library ethics. Such restrictions should only exist where required to protect the security and privacy interests of human subjects. (2) Click-through agreements greatly complicate some research designs that may aggregate data from different sources. If individual sources come with different (and sometimes ambiguous or even contradictory) contractual obligations, they become less interoperable. For example, some applications, including novel visualizations or analyses, may require re-publication (in some form) of datasets obtained from many sources. If these sources restrict re-publication, uses become limited. Thus, such agreements could hamper some research designs, especially for multidisciplinary investigations. (3) Access restrictions and click-through agreements inhibit information discovery and use. Researchers will be less likely to find relevant information through casual browsing or through search engines.
Open Data (anonymous open access, public domain, no use restrictions) | (1) Researchers invest a great deal of time, effort, and talent in creating data. They also face significant professional risks (and more!) in producing data. This investment should be recognized and researchers should have (some) control over how their data are used and by whom. Open publication of data is too risky because professional norms regulating the use of these data are too weak. (2) The public already benefits through expanding scientific knowledge. There is little real public interest in accessing primary data. (3) Not all forms of publication are equally valued. A published dataset, even with many citations, is less professionally valuable to a researcher than a more mainstream article published in a prestigious journal. | (1) Recognition for the contribution of researchers should come through open publication and citation norms. Access restrictions, special agreements, or other encumbrances are not needed except for privacy and security concerns relating to sensitive information. Moving walls that release data openly after a few years may give data creators adequate time to exclusively benefit from their data, while still ensuring long-term accessibility. (2) Research is supported by significant public investment, either directly through federal granting programs or less directly through philanthropic sources. Because open data sharing can improve the quality and pace of science, the public interest is best served by reducing access barriers. (3) In addition to social norms, technologies can help promote professional recognition for data publication. If data are published with adequate citation systems, impact measures can be developed. Widely cited, “seminal datasets” can be recognized.
It is also important to mention what was not discussed. Our working group did not specifically address the option of ownership and access restrictions over data as a means for cost-recovery. This was a topic of other working group discussions. In addition, the nature of envisioned uses for data did not receive much discussion. There is great need for additional exploration of how datasets can be used and the implications of various access controls for different use scenarios. For example, privacy concerns are increasingly difficult to address through “de-identification” measures. Again and again, researchers have been able to infer personal identities in de-identified datasets through reference to other public datasets. The failure of de-identification to offer much privacy protection makes access restrictions all the more important for sensitive anthropological information.
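The re-identification risk described above can be illustrated with a minimal Python sketch. Every record, field name, and the choice of quasi-identifiers (birth year, sex, village) is hypothetical and invented for this example. It shows only the general mechanism: a "de-identified" table can be joined to a public table on shared attributes, and any combination of attributes that is unique in the public table gives away an identity.

```python
from collections import defaultdict

# All records and field names are invented for illustration only.
deidentified = [
    {"birth_year": 1961, "sex": "F", "village": "A", "sensitive_field": "response X"},
    {"birth_year": 1975, "sex": "M", "village": "B", "sensitive_field": "response Y"},
    {"birth_year": 1975, "sex": "M", "village": "B", "sensitive_field": "response Z"},
]
public_register = [
    {"name": "Person 1", "birth_year": 1961, "sex": "F", "village": "A"},
    {"name": "Person 2", "birth_year": 1975, "sex": "M", "village": "B"},
    {"name": "Person 3", "birth_year": 1975, "sex": "M", "village": "B"},
]

# Attributes present in both tables (an assumption of this sketch).
QUASI_IDENTIFIERS = ("birth_year", "sex", "village")


def key(record):
    """Build the linkage key from the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(deidentified_rows, public_rows):
    """Return (name, sensitive value) pairs for rows whose linkage key
    matches exactly one person in the public table."""
    names_by_key = defaultdict(list)
    for person in public_rows:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for row in deidentified_rows:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique combination identifies the person
            matches.append((names[0], row["sensitive_field"]))
    return matches


print(reidentify(deidentified, public_register))
```

In this toy data only the first record is linked, because the other two share their attribute combination with more than one person. The fewer people who share a combination, the weaker the protection that removing names provides.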
On the flip side, there are many use cases for research data that almost require “open data” approaches to dissemination. For example, a researcher may develop a compelling and analytically useful way to visualize shifting social relationships in primate groups. This visualization may draw upon several datasets, and if one or more of those datasets have access and re-publication restrictions, public deployment and presentation of the visualization may be prohibited. Many software approaches supporting visualization make it easy to extract source data. Enforcing data protection measures in a networked environment where there are great demands for aggregation and reuse of data is very difficult.
Institutional and National Claims:
Researchers often work in complex contexts where several organizations and even governments may make various ownership claims over data. One participant recounted her experience of a project in which three different entities (a national government controlling the research site, a European research institute, and her own university) each made ownership or control claims over data. These claims become increasingly difficult to manage, especially since, in this particular context, the raw data were of little value and needed significant investment in cleanup, annotation, and other processing before they could provide a useful basis for analysis. What credit, recognition, and ownership rights should be given to the researchers who contributed these nontrivial efforts to improving the quality and usability of raw data?
Copyright Complexity:
US copyright law and certain laws in foreign jurisdictions add further complexity to data sharing and ownership. In the US, copyright does not apply to “facts” (or “ideas”); it only applies to fixed expressions having some minimal level of creativity. The dividing line between copyrightable “expressions” and public domain “facts” is very ambiguous. This ambiguity applies as much to metadata as it does to data. In our working group discussion, we explored how certain forms of metadata, particularly metadata describing the meaning, methods, constraints and limitations of a dataset, would likely be covered by copyright. Other forms of metadata, particularly bibliographic metadata and technical metadata (such as those describing file formats, checksums and collection structures), would be considered more factual and public domain. Thus, the copyright status of much content in databases compiled by researchers and the metadata about those databases must be considered on a case-by-case basis. Furthermore, the European Union has database protection laws that protect compilations of data (including “factual data”). Data sharing and interoperability with EU partners will require addressing these EU database protection laws.
Allowing any use involving duplication and modification of copyrighted works requires some form of license that articulates certain permissions, restrictions and requirements. Licenses come in many varieties, and one often sees informal copyright licenses on scholarly materials stating something like “for educational purposes only.” Informal or custom licensing of content may create interoperability problems, because many sets of ambiguously expressed permissions and restrictions for reuse may be difficult to manage. For sharing copyrighted works, use of standard Creative Commons licenses helps to overcome these compatibility and complexity problems. Because these licenses are standardized, they simplify managing and aggregating large sets of commonly licensed content. Creative Commons licenses are also expressed (in RDFa) as standard metadata. This helps with data interoperability goals because the metadata allow users to discover content that is legally interoperable. If datasets have additional “click-through” requirements imposed on them, these requirements should also be expressed in standard metadata.
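As a rough illustration of why standardized, machine-readable license metadata helps aggregation, the Python sketch below sorts dataset records by whether they could be combined and re-published. The record layout, the dataset names, the `click_through` field, and the set of license URIs treated as permitting re-publication are all assumptions made for this example. It is not part of any Creative Commons tooling, and a real system would read the license from embedded metadata rather than from a hand-written list.

```python
# License URIs that this sketch treats as permitting re-publication of
# aggregated data. The membership of this set is an illustrative assumption.
REPUBLISHABLE = {
    "http://creativecommons.org/licenses/by/3.0/",
    "http://creativecommons.org/publicdomain/zero/1.0/",
}

# Hypothetical dataset descriptions; names and fields are invented.
datasets = [
    {"id": "site-survey",
     "license": "http://creativecommons.org/licenses/by/3.0/",
     "click_through": False},
    {"id": "primate-census",
     "license": "http://creativecommons.org/publicdomain/zero/1.0/",
     "click_through": False},
    {"id": "field-notes",
     "license": "custom: for educational purposes only",
     "click_through": True},
]


def can_aggregate(record):
    """True if the license is a recognized standard one and no extra
    click-through agreement is attached."""
    return record["license"] in REPUBLISHABLE and not record["click_through"]


def partition(records):
    """Split dataset ids into those usable in an aggregate and those blocked."""
    usable = [r["id"] for r in records if can_aggregate(r)]
    blocked = [r["id"] for r in records if not can_aggregate(r)]
    return usable, blocked


usable, blocked = partition(datasets)
print("usable:", usable)
print("blocked:", blocked)
```

The informally licensed record cannot be checked automatically and ends up blocked by default, which is the interoperability cost of custom terms that the paragraph above describes.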
However, the ambiguous copyright status of much database content together with sui generis legal protections like the EU database laws make scientific data-sharing complex with reference to common baseline standards. It is not clear if Creative Commons licenses could apply to many datasets. Creative Commons considered and ultimately rejected an approach which would have mandated adherence to a single license; put simply, this approach, which implicitly builds on intellectual property rights and the ideas of licensing as understood in software and culture, is difficult to apply in scientific uses. Therefore, Creative Commons, through its science division, Science Commons, is laying out principles for open access data and a protocol for implementing those principles. Creative Commons recently released the CC Zero protocol to be applied to scientific datasets. CC Zero is essentially a public domain declaration that provides a common baseline standard concerning the legal aspects of data interoperability.
Conclusions:
Access controls and ownership of researcher data remain contentious issues. The questions and debates that arose in this working group need to continue. However, the perfect should not be the enemy of the good. While open data represent an ideal (to some workshop participants), open data may not be feasible or advisable in the short term. Social norms and expectations are continually evolving and it may take time for data publication to see adequate recognition. Thus, pragmatism toward these issues seems warranted.
However, a stance that offers some near-term pragmatism should not result in policies set in stone. The contested nature of this debate should be openly acknowledged as anthropological data sharing and preservation systems are rolled out. Debate should continue and will be better informed as some form of data sharing becomes more commonplace. Governance processes should be in place to continually revisit and adjust access and data-ownership policies into the future.
Griffiths, Aaron. 2009. “The Publication of Research Data: Researcher Attitudes and Behaviour.” International Journal of Digital Curation 4. http://www.ijdc.net/index.php/ijdc/article/view/101 (Accessed September 16, 2009).
Uhlir, Paul F., and Peter Schröder. 2007. “Open Data for Global Science.” Data Science Journal 6:OD36-OD53.
Narayanan, Arvind, and Vitaly Shmatikov. 2009. “De-anonymizing social networks.” IEEE security & privacy 9.
Ohm, Paul. 2009. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” University of Colorado Law Legal Studies Research Paper 09. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 (Accessed September 8, 2009).
Sweeney, Latanya. 2000. “Uniqueness of Simple Demographics in the U.S. Population.” Pittsburgh, PA: Carnegie Mellon University http://privacy.cs.cmu.edu/dataprivacy/papers/LIDAP-WP4abstract.html